ABSTRACT

A formal analysis is presented of the effects of item
deletion on equating/scaling functions and reported score
distributions. The phrase "item deletion" refers to the process of
changing the original key of a flawed item to either all options
correct, including omits, or to no options correct, i.e., not scoring
the flawed item. There are two aspects to the present analysis. The
first aspect is analytical, focusing on the development of a formal
model for the item deletion effect by decomposing it into its
constituent elements. The second component of the analysis is
empirical, involving the use of actual Scholastic Aptitude Test data
to illustrate and supplement the analytical results. The analytical
decomposition demonstrates how the effects of item properties, test
properties, individual examinee responses and rounding rules combine
to produce the item deletion effect on the equating/scaling function
and candidate scores. In addition, the analytical component of the
report examines the effects of not scoring vs. scoring all options
correct and the effects of re-equating vs. not re-equating, as well
as the interaction between the decision to re-equate or to not
re-equate and the scoring option chosen for the flawed item.
(Author/PN)
ABSTRACT

The purpose of this report is to present a formal analysis of the
effects of item deletion on equating/scaling functions and reported
score distributions. The phrase "item deletion" shall be used to refer
to the process of changing the original key of a flawed item to either
all options correct, including omits, or to no options correct, i.e.,
not scoring the flawed item. There are two aspects to the present
analysis. The first aspect is analytical, focusing on the development
of a formal model for the item deletion effect by decomposing it into
its constituent elements. The second component of the analysis is
empirical, involving the use of actual data to illustrate and supplement
the analytical results. The analytical decomposition demonstrates how
the effects of item properties, test properties, individual examinee
responses and rounding rules combine to produce the item deletion effect
on the equating/scaling function and candidate scores. In addition to
demonstrating how the deleted item's psychometric properties can affect
the equating function, the analytical component of the report examines
the effects of not scoring vs. scoring all options correct and the
effects of re-equating vs. not re-equating, as well as the interaction
between the decision to re-equate or to not re-equate and the scoring
option chosen for the flawed item. The empirical portion of the report
uses data from the May 1982 administration of the SAT, which contained
the circles item, to illustrate the effects of item deletion on reported
score distributions and equating functions. The empirical data verify
what the analytical decomposition predicts.
EFFECTS ON SCORE DISTRIBUTIONS OF DELETING AN UNKEYABLE ITEM FROM A TEST

Within the past few years, the pyramid problem on a 1980 form of the
PSAT/NMSQT and the adjacent circles item on a May 1982 form of the SAT
have generated a great amount of press about items with indefensible
keys. Wainer's (1981) large sample analysis of the PSAT pyramid
problem, cleverly entitled "Pyramid Power: Searching for an Error in
Test Scoring with 830,000 Helpers", demonstrated that even a statistical
analysis based on almost 830,000 examinees would not have revealed that
the pyramid problem was miskeyed. It appears safe to anticipate that a
similar conclusion would be reached if a large sample analysis were
performed on the adjacent circles problem. In addition to the highly
visible external tempest created by these items, there has been a less
visible yet vibrant discussion about what to do about defective items.

One of the available policy options is to delete the item from the
test. This option was employed with the SAT adjacent circles item.
Item deletion is operationalized by not scoring the item and effectively
reducing the test length by a single item. Petersen (Note 1) summarized
the effects of not scoring the adjacent circles item on equating and
reported scores.

Very recently, two problem items appeared on a Biology Achievement 
test that was administered in June 1982. Consideration was given to 
either giving everyone credit or not scoring the two items under 
conditions of re-equating vs. not re-equating. When re-equating is 
employed, there is no psychometric difference between scaled scores 
based on giving everyone credit on the problem items and scaled scores 
based on not scoring the problem items. When re-equating is not
employed, however, there is a very noticeable difference between giving
everyone credit and not scoring. Petersen (Note 2, Note 3)
summarized the effects of re-equating vs. not re-equating under various 
scoring options for the two problem items on the Biology Achievement 
Test. 

The purpose of this report is to present a formal analysis of the
effects of item deletion on equating/scaling functions and reported
score distributions. The phrase "item deletion" shall be used to refer
to the process of changing the original key of a flawed item to either
all options correct, including omits, or to no options correct, i.e.,
not scoring the flawed item. Although neither not scoring the item nor
scoring the item all options correct involves deletion of the item in a
physical sense from the test booklet, the flawed item is, in both cases,
deleted psychometrically from the test scores that candidates receive.
A psychometrically deleted item has no psychometric impact on scores
that individuals receive on the test. In other words, regardless of
whether the item is difficult or easy, discriminating or not, it has the
same impact on all candidates' scores: If the item is not scored, all
candidates receive no points for that item; if the item is scored all
options correct, everyone receives one raw score point regardless of how
they responded to the flawed item. As the title implies, this report is
limited to item deletion in the sense just indicated. The effects of
multiple keying of an item are not studied.

There are two aspects to the present analysis. The first aspect is
analytical, focusing on the development of a formal model for the item
deletion effect by decomposing it into its constituent elements.
Portions of this development are mathematically complex. For the
benefit of the general reader, the salient features of this development
are summarized in the next few paragraphs. The second component of the
analysis is empirical, involving the use of actual data to illustrate
and supplement the analytical results. This empirical component, which
is less mathematically demanding to read than the analytical component,
is summarized in the last two paragraphs of this introduction.

The analytical decomposition demonstrates how the effects of item
properties, test properties, individual examinee responses and rounding
rules combine to produce the item deletion effect on the equating/scaling
function and candidate scores. Prior to decomposing the item
deletion effects, the fundamentals of item response theory (IRT)
true-score equating are described with the focus placed on the
compositional nature of the IRT true-score equating process. In short,
the equating process is composed of various new form and old form
components. Item deletion affects the equating/scaling process through
its effects on the new form components. The psychometric
characteristics of items and rounding rules for formula (or raw) and
scaled scores contribute to changes in the equating/scaling function.
An item's difficulty determines where the change in equating function
occurs along the raw (or formula) score scale. An item's discriminating
power determines the abruptness and direction of the change. The item's
susceptibility to guessing moderates the effects induced by the item's
difficulty and discrimination. Rounding rules can exaggerate small
effects in a rather unpredictable way.



In addition to demonstrating how the deleted item's psychometric 
properties can affect the equating function, the analytical component of 
the report examines the effects of not scoring vs. scoring all options
correct and the effects of re-equating vs. not re-equating, as well as 
the interaction between the decision to re-equate or to not re-equate 
and the scoring option chosen for the flawed item. Not scoring the 
flawed item and scoring it all options correct affect the equating 
function in opposite ways. While the item's psychometric properties 
determine where the effect occurs, not scoring the flawed item results 
in a shorter test that is harder than the original test, while scoring 
all options correct results in a test that is as long as but easier than 
the original test. The flawed item's difficulty and the scoring 
decision determine how much the new conversion approximates the original 
for a given formula score. For example, while deleting a very difficult 
item may have no substantial effect on the equating function when the 
item is not scored, scoring that same difficult item all options correct 
can have a very noticeable impact on the equating function. 

The analytical decomposition also examines the issue of re-equating 
vs. not re-equating. Not re-equating allows the flawed item's 
psychometric properties to have a substantial impact on scaled scores 
and also allows the decision to score all options correct vs. not score 
to impact on final reported scores. In contrast, re-equating makes the 
scoring decision irrelevant and mitigates the impact of the flawed 
item's psychometric properties on reported scores. Hence, re-equating
after deletion of a flawed item is clearly better from a psychometric
point of view.
The empirical portion of the report uses data from the May 1982
administration of the SAT, which contained the circles item, to
illustrate the effects of item deletion on reported score distributions
and equating functions. Six items from that test, in addition to the
circles item, were selected for item deletion.

The effects of item deletion were studied in the following manner.
Each of the six items was deleted from the 60-item total test containing
the circles item and the 59-item test that excluded the circles item.
As a consequence, six separate 59-item tests and six separate 58-item
tests were simulated. Each of these 12 tests will be compared to the
full 60-item test. One type of comparison focuses on equating/scaling
functions and differences induced by deletion of items with certain
properties. Effects on both rounded and unrounded converted scores will
be assessed. Difference plots are used to examine these effects. In
addition, effects of item deletion on examinee formula scores are
assessed. This step necessitated rescoring for a representative sample
of 45,579 examinees, the same set of examinees used to assess the effect
of deleting the circles item. Differences among rounded and unrounded
scaled scores are summarized.

The empirical data verify what the analytical decomposition
predicts. For example, item difficulty determines where the change in
equating functions occurs and by how much reported scores produced by not
re-equating under each scoring option (not score vs. score all options
correct) differ from those produced by re-equating under either scoring
option. The illustrative data demonstrate that re-equating mitigates
the impact of the deleted item's psychometric properties on reported
scores. In fact, re-equating reduces the impact of the flawed item's
characteristics on reported scores to an effect that is smaller than
that associated with the rounding of scaled scores to two significant
digits. In contrast, not re-equating enables the item's properties and
the scoring decision to have very noticeable impacts on reported score
distributions. In short, the illustrative data vividly demonstrate the
importance of re-equating and, given re-equating, the relative
unimportance of flawed items' psychometric characteristics. In the
process, the relative importance of rounding rules is also illustrated.
Analytical Decomposition

The effects of deleting an unkeyable item can be accounted for by
three components:

• Changes in the equating function that maps rounded formula scores
on the new form onto formula scores on the old form.

• Changes in the formula scores of individual examinees.

• The rounding of scaled scores to their two-significant-digit
reported-score form.

The effects of these three components will be addressed in sequence.

Changes in the Equating Function

This component of the item deletion effect is affected by the
psychometric properties of the item. The particular effect depends on
which equating method is employed, e.g., IRT true-score equating, linear
equating, or equipercentile equating. Here, we focus on IRT true-score
equating, the method employed for the SAT and the PSAT/NMSQT.

IRT true-score equating. Lord (1980, p. 198) demonstrates that
observed scores on two tests cannot satisfy certain equating
requirements unless either (1) both scores are perfectly reliable or (2)
the two tests are strictly parallel, in which case equating is
unnecessary. Since perfect reliability is virtually unattainable,
observed-score equating is either unnecessary or impossible.
Consequently, Lord advocates true-score equating.

Lord (1980, p. 199) cites three important requirements for equating
two unidimensional tests that measure the same ability:

1. Equity: For every ability level, the conditional frequency
distribution of equated scores from test X for a given ability level
should equal the conditional frequency distribution of equated scores
from test Y at that same ability level.

2. Invariance across groups: The equating function should be the
same regardless of the population from which it was determined.

3. Symmetry: The equating relationship should be the same
regardless of whether X is equated to Y or Y is equated to X.

IRT true-score equating meets these three requirements because true
scores on tests measuring the same ability are perfectly related, i.e.,
there is an exact unique functional relationship between the true scores
on the two tests. The invariance condition is met because IRT true-score
equating depends solely on IRT item parameters which theoretically are
invariant across populations of examinees. Finally, symmetry follows
from the identity relationship.

To appreciate the mechanics of IRT equating, we need to introduce
some mathematical concepts and notation. We begin with the item
response function, $P_g(\theta)$. The item response function is a mathematical
expression for describing the probability of success on an item as a
function of a single characteristic of the individual answering the
item, his or her ability, and multiple characteristics of the item. The
IRT model used for the SAT and the PSAT/NMSQT is the three-parameter
logistic,

(1)  $P_g(\theta) = c_g + (1 - c_g)\left[1 + e^{-1.7 a_g(\theta - b_g)}\right]^{-1}$ ,

where:

$P_g(\theta)$ = the probability that an examinee with ability $\theta$ answers item
g correctly;

$a_g$ = item discrimination parameter for item g;

$b_g$ = item difficulty parameter for item g;

$c_g$ = lower asymptote of the item response curve, the probability
that an examinee with extremely low ability answers item g
correctly.

In (1), $\theta$ is the ability parameter, a characteristic of the examinee,
and $a_g$, $b_g$ and $c_g$ are the item parameters that determine the shape of
the item response function.
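
The three-parameter logistic function can be sketched in a few lines of Python. This is only an illustration: the function name `p3pl`, the scaling constant `D = 1.7`, and the item parameter values are assumptions chosen for the example, not values taken from the report.

```python
import math

D = 1.7  # assumed logistic scaling constant (conventional value)


def p3pl(theta, a, b, c):
    """Three-parameter logistic item response function.

    theta: examinee ability; a: discrimination; b: difficulty;
    c: lower asymptote (probability of success at very low ability).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


# Hypothetical item with a = 1.0, b = 0.0, c = 0.2.
p_low = p3pl(-6.0, a=1.0, b=0.0, c=0.2)   # close to the lower asymptote c
p_mid = p3pl(0.0, a=1.0, b=0.0, c=0.2)    # at theta = b: c + (1 - c) / 2
p_high = p3pl(6.0, a=1.0, b=0.0, c=0.2)   # close to 1
```

At $\theta = b_g$ the curve sits halfway between the lower asymptote and 1, which is why the difficulty parameter locates the item on the ability scale.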

For a test composed of n items, summing the item response functions
over the n items yields the test characteristic function

(2)  $R = \sum_{g=1}^{n} P_g(\theta)$ .

The test characteristic function identifies the expected number-right
score for each level of $\theta$. This expected number-right score is the
number-right true score on that test.

If test X and test Y are measures of the same ability $\theta$, then their
number-right true scores are related to $\theta$ by their test characteristic
functions

(3)  $R_x = \sum_{i=1}^{n} P_i(\theta)$ ;  $R_y = \sum_{j=1}^{m} P_j(\theta)$ .

Note that $R_x$ and $R_y$ are functionally related to each other through their
relationships with $\theta$. Substituting values of $\theta$ into $R_x$ and $R_y$ in (3)
yields pairs of X and Y true scores. These pairs of true scores define
$R_x$ as a function of $R_y$ and vice versa, and constitute an equating of
true scores.

Let $t_x$ and $t_y$ refer to the test characteristic function
transformations that convert $\theta$ to $R_x$ and $R_y$, respectively, i.e.,

(4)  $R_x = t_x(\theta)$ ;  $R_y = t_y(\theta)$ .

Then, we can express $\theta$ as a function of $R_x$ and $R_y$ via

(5)  $t_x^{-1}(R_x) = \theta = t_y^{-1}(R_y)$ .

Let us designate X as the old form and Y as the new form. To find the
transformation that equates Y to X, we first find the $\theta$ corresponding to
a particular number-right true score on Y via

(6)  $\theta = t_y^{-1}(R_y)$ .

Next, we find the number-right true score on X corresponding to that $\theta$
via

(7)  $R_x = t_x(\theta) = t_x(t_y^{-1}(R_y))$ .

Substituting a value of $R_y$ into (7) yields its equivalent in the $R_x$ metric.
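
The two steps in (6) and (7) can be sketched numerically. The test characteristic function has no closed-form inverse, so the sketch below inverts it by bisection; that choice, the scaling constant 1.7, the function names, and the three-item parameter lists are all assumptions made for illustration and are not the operational procedure described in the report.

```python
import math

D = 1.7  # assumed logistic scaling constant


def p3pl(theta, a, b, c):
    """Three-parameter logistic item response function."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def tcf(theta, items):
    """Test characteristic function: expected number-right score at theta."""
    return sum(p3pl(theta, a, b, c) for a, b, c in items)


def tcf_inverse(r, items, lo=-10.0, hi=10.0, tol=1e-10):
    """Find theta with tcf(theta) == r by bisection (tcf increases in theta when a > 0)."""
    if not tcf(lo, items) < r < tcf(hi, items):
        raise ValueError("true score is outside the range covered by [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if tcf(mid, items) < r:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def equate_true_score(r_y, items_y, items_x):
    """Map a number-right true score on Y to the X metric: t_x(t_y^{-1}(R_y))."""
    return tcf(tcf_inverse(r_y, items_y), items_x)


# Hypothetical (a, b, c) triples; form Y is somewhat harder than form X.
items_x = [(1.0, -1.0, 0.2), (0.8, 0.0, 0.2), (1.2, 1.0, 0.2)]
items_y = [(0.9, -0.5, 0.2), (1.1, 0.5, 0.2), (1.0, 1.5, 0.2)]
r_x = equate_true_score(1.5, items_y, items_x)
```

Because the hypothetical form Y is harder, a true score of 1.5 on Y corresponds to a higher true score on X. Equating a form to itself returns the input score, up to the bisection tolerance.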

Both the SAT and the PSAT/NMSQT are formula-scored tests. In IRT,
true formula scores on X and Y are defined via

(8)  $FS_x(\theta_a) = \sum_{i=1}^{n_a} P_i(\theta_a) - \sum_{i=1}^{n_a} \left[(1 - P_i(\theta_a))/(A_i - 1)\right]$

and

(9)  $FS_y(\theta_a) = \sum_{j=1}^{m_a} P_j(\theta_a) - \sum_{j=1}^{m_a} \left[(1 - P_j(\theta_a))/(A_j - 1)\right]$ ,

where $n_a$ and $m_a$ are the number of items on X and Y that were reached by
examinee a, and $A_i$ and $A_j$ are the number of response alternatives on
items i and j, respectively. When an examinee reaches all items, and
all items have A options, (8) and (9) simplify to

(10)  $FS_x = (A R_x - n)/(A - 1)$ ;  $FS_y = (A R_y - m)/(A - 1)$ .
For simplicity of exposition, we will assume all examinees reach every
item, all items have the same number of options A, and $f_x$ and $f_y$
represent the transformations in (10), i.e.,

(11)  $FS_x = f_x(R_x)$ ;  $FS_y = f_y(R_y)$ .

Rearrangement of terms in (10) yields

(12)  $R_x = ((A-1)FS_x + n)/A$ ;  $R_y = ((A-1)FS_y + m)/A$ ,

which can be expressed as

(13)  $R_x = f_x^{-1}(FS_x)$ ;  $R_y = f_y^{-1}(FS_y)$ .
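
The conversions between number-right and formula scores in (10) and (12) are simple linear maps, sketched below. The function names and the 60-item, five-option example are illustrative assumptions.

```python
def f(r, n_items, n_options):
    """Number-right true score -> true formula score, as in (10)."""
    return (n_options * r - n_items) / (n_options - 1)


def f_inverse(fs, n_items, n_options):
    """True formula score -> number-right true score, as in (12)."""
    return ((n_options - 1) * fs + n_items) / n_options


# Hypothetical case: 45 right out of 60 five-choice items, none omitted.
fs = f(45.0, n_items=60, n_options=5)             # 45 - 15/4
r_back = f_inverse(fs, n_items=60, n_options=5)   # recovers 45
```

With every item reached and answered, 45 right implies 15 wrong, so the formula score is 45 minus 15/4, and the inverse map returns the original number-right score.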

IRT true formula-score equating proceeds as follows: The true
formula score on new form Y is converted to a number-right true score on
Y via

(14)  $R_y = f_y^{-1}(FS_y)$ .

Then, (6) is used to convert $R_y$ to $\theta$, and (7) converts $\theta$ to $R_x$, yielding

(15)  $R_x = t_x(t_y^{-1}(f_y^{-1}(FS_y)))$ .

Next $R_x$ is converted to $FS_x$ via (11), yielding

(16)  $FS_x = f_x(R_x) = f_x(t_x(t_y^{-1}(f_y^{-1}(FS_y))))$ .

Equation (16) expresses the equating of true formula scores on Y to true
formula scores on X. In practice, the transformation from $FS_x$ to
scaled score is applied to the equated scores in (16) to place the Y
test on scale, i.e.,

(17)  $SS_y = s_x(f_x(t_x(t_y^{-1}(f_y^{-1}(FS_y))))) = s_y(FS_y)$ .
In sum, the scaling function for formula scores on new test Y is

(18)  $s_y = s_x \circ f_x \circ t_x \circ t_y^{-1} \circ f_y^{-1}$ ,

where $\circ$ indicates composition of functions.

Table 1 contains a convenient summary of the various functions
involved in IRT equating and the scores they operate upon. In this
table, the four types of scores, ability estimate ($\theta$), number-right
score ($R_x$, $R_y$), formula score ($FS_x$, $FS_y$), and scaled score ($SS_x$, $SS_y$)
for old form X and new form Y are defined at the bottom. Above these
score designations is a list of the functions. Alongside each function
is a description of the mapping accomplished by that function and the
number of the first equation containing that function. For example, $t_x$
maps $\theta$ onto number-right true score on test X, while $s_y$ maps formula
scores on test Y, $FS_y$, onto the reported score scale for that test, $SS_y$.

Table 1

Functions and Scores Employed in IRT Equating

Function        maps Score    onto Score    Equation #

$t_x$           $\theta$      $R_x$         4
$t_y$           $\theta$      $R_y$         4
$t_x^{-1}$      $R_x$         $\theta$      5
$t_y^{-1}$      $R_y$         $\theta$      5
$f_x$           $R_x$         $FS_x$        11
$f_y$           $R_y$         $FS_y$        11
$f_x^{-1}$      $FS_x$        $R_x$         13
$f_y^{-1}$      $FS_y$        $R_y$         13
$s_x$           $FS_x$        $SS_x$        18
$s_y$           $FS_y$        $SS_y$        17

Score             Old Form      New Form

Ability           $\theta$      $\theta$
Number Right      $R_x$         $R_y$
Formula Score     $FS_x$        $FS_y$
Scaled Score      $SS_x$        $SS_y$

Note: A parallel set of scores and related functions exists for shortened
tests.

The effect of item deletion. Equation (18) shows that the scaling
function for test Y is a composite of several functions. Two of these
functions deal with the relationships between test Y items and ability
$\theta$, two deal with the relationships between test X items and ability $\theta$,
and the fifth is the scaling function for test X. When an item is
deleted from test Y, three of these functions are unaffected, namely,
the functions associated with the old test X. The functions relating
test Y true scores to $\theta$ are affected by item deletion. Hence, we shall
focus on the effects on these functions. In this section and the
following section, we presume that deletion of the item is accomplished
by a decision not to score the deleted item. After the development for
deleting and not scoring, the alternative of "deleting" and giving
credit to all options, including omits, will be considered.

The first function to be considered is $f_y^{-1}$ in (14). Let $f_{y'}^{-1}$
represent the formula score to number-right true-score conversion for
the reduced test Y' composed of the m-1 items that remain after deletion
of item k. Likewise, $t_{y'}$ represents the relation between number-right
true score and $\theta$ on test Y'. Hence, the scaling function for Y' is

(19)  $s_{y'} = s_x \circ f_x \circ t_x \circ t_{y'}^{-1} \circ f_{y'}^{-1}$ .

The function $t_{y'}$ is defined by

(20)  $R_{y'} = \sum_{j=1,\, j \neq k}^{m} P_j(\theta)$ .

Note that $R_{y'}$ can be related to $R_y$ for the full test via

(21)  $R_{y'} = R_y - P_k(\theta)$ .

Hence, $t_y$ and $t_{y'}$ differ by the item characteristic function for the
deleted item k. Note that for all $\theta$, $R_y \geq R_{y'}$. Since the item
characteristic function for any item is a function of three item
parameters, the particular change from $t_y$ to $t_{y'}$ is a function of these
parameters. Hence, the deleted item's psychometric properties, embodied
in the item discrimination parameter ($a_k$), the item difficulty parameter
($b_k$), and the lower asymptote ($c_k$), affect the equating function through
their effect on $t_y$. Deletion of the item affects $t_y$ in another
important way, namely, it constricts the range of the function. Whereas
$t_y$ maps $\theta$ onto a scale bounded by 0 and m, $t_{y'}$ maps $\theta$ onto a scale
bounded by 0 and m-1. This restriction in the range of the function
occurs regardless of the deleted item's psychometric properties and may
be the major contributing factor to differences between $t_y$ and $t_{y'}$.
The function $f_{y'}^{-1}$ is embodied in

(22)  $R_{y'} = ((A-1)FS_{y'} + (m-1))/A$ .

Given the relation in (21), $FS_y$ can be related to $FS_{y'}$ via

(23)  $(FS_{y'}(A-1) + (m-1))/A = R_y - P_k(\theta)$

      $FS_{y'}(A-1) = A R_y - A P_k(\theta) - (m-1)$

      $FS_{y'} = (A R_y - m)/(A-1) + (1 - A P_k(\theta))/(A-1)$

      $FS_{y'} = FS_y + (1 - A P_k(\theta))/(A-1)$ .

In (23), it is clear that true formula score is affected by the
properties of the deleted item. Note that $FS_{y'}$ is greater than $FS_y$ for
values of $\theta$ for which $P_k(\theta)$ is less than 1/A. This is an interesting
result because it states that the expected formula score for individuals
of very low ability can increase when an item is deleted despite the
fact that the test is shortened by one item. For very low level
examinees, the maximum increase is $(1 - A c_k)/(A-1)$. Since the minimum $c_k$
value is zero, the maximum gain in expected formula score is 1/(A-1),
which for a five-choice item is .25. At the other extreme, very high
ability individuals exhibit decreases in expected formula score of
(1-A)/(A-1), which is -1, precisely the decrease in expected
number-right score at that level of ability.
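
The limiting values quoted above can be checked numerically from the last line of (23). The sketch below is illustrative only: the function names, the scaling constant 1.7, and the item parameters are assumptions.

```python
import math

D = 1.7  # assumed logistic scaling constant


def p3pl(theta, a, b, c):
    """Three-parameter logistic item response function."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def fs_shift(theta, a, b, c, n_options):
    """Change in expected formula score when item k is not scored:
    (1 - A * P_k(theta)) / (A - 1)."""
    return (1.0 - n_options * p3pl(theta, a, b, c)) / (n_options - 1)


# Hypothetical five-choice item with b = 0 and a = 1.
low_no_guessing = fs_shift(-8.0, a=1.0, b=0.0, c=0.0, n_options=5)  # near +0.25
low_chance = fs_shift(-8.0, a=1.0, b=0.0, c=0.2, n_options=5)       # near 0
high = fs_shift(8.0, a=1.0, b=0.0, c=0.0, n_options=5)              # near -1
```

With $c_k = 0$ the low-ability gain approaches 1/(A-1) = .25, with $c_k = 1/A$ it vanishes, and at high ability the loss approaches one full point regardless of $c_k$.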



Since $P_k(\theta)$ is always greater than zero, it can be inferred from
(21) that for all $\theta$, $R_y > R_{y'}$. This inequality reflects that it is
always easier to obtain a given number right on Y than on Y' regardless
of which item is deleted. How much higher the expected score on Y is
than the expected score on Y' depends on the properties of the deleted
item and the individual's ability level. Since $R_y \geq R_{y'}$ for the same $\theta$,
it follows that a given number-right score from Y' will convert to a
larger (or as large) $\theta$-value than will the same number-right score from
Y. In short, the longer test Y appears easier than the shorter test Y'
because of the inequality $R_y \geq R_{y'}$, i.e., at all values of number-right
score, the function $t_{y'}^{-1}$ will exceed or equal $t_y^{-1}$, i.e., $t_{y'}^{-1} \geq t_y^{-1}$.

The same effect tends to occur for formula scores as well. In
particular,

(24)  $FS_y \geq FS_{y'}$ for all $P_k(\theta) \geq 1/A$.

Since $P_k(\theta)$ should exceed 1/A for most values of $\theta$, it follows that for
almost all values of $\theta$, a given formula score on Y' will convert to a
larger (or as large) $\theta$-value than the same formula score on Y, i.e.,
$(t_y^{-1} \circ f_y^{-1}) \leq (t_{y'}^{-1} \circ f_{y'}^{-1})$ for a given formula score. When $c_k \geq 1/A$,
this inequality will hold for all values of $\theta$. In short, the longer
test Y will appear easier than the shorter test Y' for most if not all
values of $\theta$. How much easier and for what values of $\theta$ will depend on
the psychometric properties of the deleted item.

Since $t_y^{-1} \leq t_{y'}^{-1}$ for all number-right scores, and $f_y^{-1} \leq f_{y'}^{-1}$
will be true for most formula scores, the scaling function for Y' will
tend to be higher than that of Y for most formula scores, i.e., $s_{y'} \geq s_y$
for most formula scores.
Effects of deletion on scaled scores. The relationship between $s_{y'}$
and $s_y$ can be constrained by the effects of the psychometric properties
of the deleted item on the relationship between $FS_y$ and $FS_{y'}$. The
general relationship can be expressed as:

(25)  $SS_{y'} \geq SS_y$ for all $P_k(\theta) \geq 1/A$, and

      $SS_{y'} < SS_y$ for all $P_k(\theta) < 1/A$.

Note that the point at which $P_k(\theta)$ equals 1/A depends on $a_k$, $b_k$, and $c_k$.
Examination of specific cases of this general relationship proves
enlightening.

• If $c_k = 1/A$ and $a_k = 0$, then $SS_{y'} = SS_y$
for all formula scores.

This unrealistic item is characterized by a flat item characteristic
curve of height 1/A. Such a curve would be observed if all examinees
responded randomly to the item.

• If $c_k > 1/A$, then $SS_{y'} > SS_y$ for all formula scores.

Whenever the lower asymptote exceeds the chance level, the shortened
test is harder than the longer test at all levels of $\theta$.

• If $c_k = 1/A$ and $a_k$ is extremely large, then

      $FS_{y'} = FS_y$ for all $\theta < b_k$, and

      $FS_{y'} = FS_y - 1$ for all $\theta \geq b_k$.

Hence, $SS_{y'} = SS_y$ for all $\theta < b_k$,

and $SS_{y'} \geq SS_y$ for all $\theta \geq b_k$.
This item has a lower asymptote at the chance level and exhibits a very
steep climb from $P_k(\theta) = 1/A$ to $P_k(\theta) = 1$ at $\theta = b_k$. In short, it is a highly
discriminating item on which examinees either know the answer or guess
randomly. While deletion of the item has no effect for those below
$\theta = b_k$, the shorter test is harder for those whose $\theta \geq b_k$. Hence,
$SS_{y'} \geq SS_y$ for this latter group.

• If $c_k = 0$ and $a_k$ is extremely large, then

      $FS_{y'} = FS_y + 1/(A-1)$ for all $\theta < b_k$, and

      $FS_{y'} = FS_y - 1$ for all $\theta > b_k$.

Hence, $SS_{y'} < SS_y$ for all $\theta < b_k$,

      $SS_{y'} = SS_y$ at $\theta = b_k$, and

      $SS_{y'} > SS_y$ at $\theta > b_k$.

This is a sharply discriminating item that clearly separates those who
know it from those who do not. Deletion of this item from the test has
an interesting effect. The shortened test is harder for those with $\theta$
above $b_k$, and easier for those with $\theta$ below $b_k$. This example
illustrates a general result that follows from the general relationship
stated earlier: Deletion of an item makes the shortened test easier for
examinees who perform below chance level on that item. For all others,
the test is either as hard or harder than before.

Changes in Individual Examinee Scores

When an item is deleted from a test, the formula score of an
individual examinee may or may not change. Wainer (Note 4) referred to
the score on a test that an individual will have when a particular item
is deleted from the test as the item's influence function (IIF). In the
formula score metric, each and every item has three item influence
functions, one for each possible score on the item (omit, correct, or
incorrect). These three distinct functions are the same across all
items. In the formula score metric, these three IIFs are:

(26)  $IIF(FS_y, Y_k = 1) = -1.0$

      $IIF(FS_y, Y_k = 0) = 0.0$

      $IIF(FS_y, Y_k = -1/(A_k - 1)) = 1/(A_k - 1)$.

Note that the three functions are independent of the examinee's ability
and the item's psychometric properties. They depend solely on the
examinee's response to the deleted item and the number of response
alternatives.
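
The three formula-score influence values in (26) are just the negative of the item score that is removed when the item is not scored. A minimal sketch, using exact fractions and hypothetical function names:

```python
from fractions import Fraction


def item_formula_score(response, n_options):
    """Formula-score contribution of one item: 1 if correct, 0 if omitted,
    -1/(A - 1) if incorrect."""
    if response == "correct":
        return Fraction(1)
    if response == "omit":
        return Fraction(0)
    if response == "incorrect":
        return Fraction(-1, n_options - 1)
    raise ValueError("response must be 'correct', 'omit', or 'incorrect'")


def item_influence(response, n_options):
    """Change in an examinee's formula score when the item is not scored."""
    return -item_formula_score(response, n_options)


# Influence values for a five-choice item.
influences = {r: item_influence(r, 5) for r in ("correct", "omit", "incorrect")}
```

For a five-choice item the three values are -1, 0, and 1/4, and nothing about the examinee's ability or the item's parameters enters the calculation.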

The impact of rounding rules. When expressed in the reported scaled
score metric, however, the three item influence functions do depend on
examinee ability and the item's psychometric properties. In addition,
rounding conventions for both formula scores and scaled scores impact on
these influence functions, a point overlooked by Wainer (Note 4) in his
treatment of item deletion. For a correct response to the deleted item,
the influence function is

(27)  $IIF(SS_y, Y_k = 1) = r_{ss}(s_{y'}(r_{fs}(FS_y - 1))) - r_{ss}(s_y(r_{fs}(FS_y)))$ ,

where in (27), $r_{ss}$ and $r_{fs}$ refer to the ETS rounding rules for reported
scaled scores and formula scores, respectively. For both the SAT and
the PSAT/NMSQT, reported scaled scores are rounded to two significant
digits, e.g., on the SAT, $r_{ss}(444.97) = 440$, while $r_{ss}(445.01) = 450$.
In addition, formula scores are rounded to integers, e.g., $r_{fs}(15.75) = 16$,
while $r_{fs}(13.25) = 13$. In (27), $r_{fs}(FS_{y'}) = r_{fs}(FS_y) - 1$.
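
The two rounding rules can be sketched as follows. The text does not say how exact ties are resolved, so rounding halves upward is an assumption here, as are the function names; for three-digit SAT scores, two significant digits amounts to rounding to the nearest 10.

```python
import math


def round_half_up(x):
    """Round to the nearest integer, resolving exact ties upward (assumed rule)."""
    return math.floor(x + 0.5)


def r_fs(formula_score):
    """Round a formula score to an integer."""
    return round_half_up(formula_score)


def r_ss(scaled_score):
    """Round a three-digit scaled score to the nearest 10 (two significant digits)."""
    return 10 * round_half_up(scaled_score / 10.0)


examples = (r_ss(444.97), r_ss(445.01), r_fs(15.75), r_fs(13.25))
```

The four examples reproduce the values given in the text: 440, 450, 16, and 13. Python's built-in `round` resolves ties to the nearest even integer, which is why it is not used in this sketch.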
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For an omit, the influence function is 

(28) IIF(SS , Y =0) - r (s , (r f (FS ) - r (s (r f (FS ))), 

y It ss y fs y ss y fs y 

because FS , ■ FS , 

y y 

For an incorrect response, there are two possible item influence 

functions. If r (FS .) « r' (FS ). then equation 28 is the item 
ss y ss y * 

influence function for an incorrect response. If, however, 

r (FS ,) = r (FS ) + -1, then the item influence function is described 
ss y ss y 

by • 

(29) .IIF(SS y . Y k -l/( V l))=r 88 (8 y ,(r f8 (FS y ) + l) - (s y (r ft (FS y ) ) ) . 

In (27)-(29), note that the item influence function depends on an item 
score component, namely the rounding conventions for scaled scores and 
formula scores ($r_{ss}$ and $r_{fs}$) and the individual's response to 
the deleted item ($Y_k$), as well as an equating function component 
($s_y$ and $s_{y'}$), which is affected by the item's psychometric 
properties and the examinee's particular ability level. In (27)-(29), 
note that the arbitrary rounding rules can have a noticeable impact. 
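
The way rounding and the two conversions combine in (27)-(29) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the treatment of ties (rounded up), the default of five response alternatives, and the two linear conversions `s_old` and `s_new` are hypothetical and are not taken from the report. Only the four rounding examples printed first come from the text above.

```python
import math

def r_fs(x):
    """Formula score rounding: nearest integer (ties rounded up, an assumption)."""
    return math.floor(x + 0.5)

def r_ss(x):
    """SAT scaled score rounding: nearest multiple of 10 (ties rounded up, an assumption)."""
    return 10 * math.floor(x / 10 + 0.5)

def iif_scaled(fs_old, response, s_old, s_new, n_alternatives=5):
    """Change in reported score for one examinee when the deleted item
    is not scored and the shortened test is re-equated.

    s_old, s_new: conversions for the original and the new test.
    """
    item_score = {
        "correct": 1.0,
        "omit": 0.0,
        "incorrect": -1.0 / (n_alternatives - 1),
    }[response]
    fs_new = fs_old - item_score  # not scoring removes the item score
    return r_ss(s_new(r_fs(fs_new))) - r_ss(s_old(r_fs(fs_old)))

# Hypothetical linear conversions, for illustration only.
def s_old(fs):
    return 228.0 + 9.5 * fs

def s_new(fs):
    return 231.0 + 9.6 * fs

# Rounding examples quoted in the text.
print(r_ss(444.97), r_ss(445.01))
print(r_fs(15.75), r_fs(13.25))

for response in ("correct", "omit", "incorrect"):
    print(response, iif_scaled(30.0, response, s_old, s_new))
```

Computing the new rounded formula score directly from the new unrounded formula score covers both cases for an incorrect response, since the rounded formula score then either stays the same or rises by one point.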
Parallel Analyses for "Deleting" Item and Giving Credit to All Options 

The preceding analyses presume that the deleted item is also not 
scored. As an alternative to not scoring the deleted item, one can 
consider giving credit to all candidates for all options, including omit. 
There are pros and cons associated with not scoring vs. giving credit to 
all options. From a psychometric viewpoint, it makes no difference as 
long as the new test is re-equated. From a public relations viewpoint, 
however, it makes a difference. On the surface, giving credit appears 
more palatable than not scoring a problem item because no raw formula 
scores go down, i.e., those who gave the keyed response keep the same 
formula score, while everyone else gets a higher formula score. In 
contrast, not scoring the deleted item reduces the formula score of 
those who gave the keyed response and increases the scores of some of 
those who answered the item incorrectly. Re-equating yields the same 
scaled scores for each candidate regardless of which scoring option is 
used with the deleted item. As a consequence of re-scoring and 
re-equating, the scaled scores for those candidates who originally "got 
the deleted item right" will either go down or stay the same. In 
contrast, those who originally "got the item wrong" will retain the same 
scaled scores or obtain higher ones. 

Since re-equating makes the scoring option (not score vs. all 
options correct) irrelevant, the choice between not scoring and giving 
everyone credit should be based on public relations considerations, 
which could differ depending on whether the item has a defensible key. 
In my opinion, when there is no defensible key, as was the case with the 
circles item, it is easier to explain a lower scaled score to a 
candidate when not scoring than when scoring all options correct. When 
not scoring the deleted item you can tell the candidate: "your score 
went down because you had 'correctly' answered an item which has been 
dropped from the test because it had no correct key; consequently your 
new raw score is one point lower than it was." In contrast, when scoring 
all options correct, you might have to say: "while your raw score was 
unchanged, everyone was given credit on the item; as a consequence the 
test became easier than it was and your unchanged raw score led to a 
lower scaled score." When there are several defensible keys, however, a 
stronger argument for scoring all options correct exists, as was the 
case with the problem Biology items. 

Scoring all options correct does affect the equating function 
differently than the decision to not score the item. A parallel 
analysis of equating functions and item influence functions can be 
conducted for the decision to score all options correct. Rather than 
repeat the analyses of the two preceding sections, I will summarize how 
they would differ from the analysis for not scoring the item. 

The item influence functions for scoring all options correct 
would be 

(30)    $\mathrm{IIF}(FS_y, Y_k = 1) = -1.0 + 1.0 = 0.0$
        $\mathrm{IIF}(FS_y, Y_k = 0) = 0.0 + 1.0 = 1.0$
        $\mathrm{IIF}(FS_y, Y_k = -1/(A-1)) = 1/(A-1) + 1.0$,

i.e., one point higher than the IIFs in (26) for not scoring the item. 
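
As a small check on the arithmetic in (26) and (30), the formula score influence under either scoring option can be written as the new item score minus the original item score. The function name, the option labels, and the default of five response alternatives below are assumptions made for illustration only.

```python
from fractions import Fraction

def iif_formula(response, option, n_alternatives=5):
    """Change in formula score for one examinee on the deleted item.

    response: "correct", "omit", or "incorrect" (hypothetical labels).
    option:   "not_scored" or "all_correct" (hypothetical labels).
    """
    # Original item score under formula scoring.
    original = {
        "correct": Fraction(1),
        "omit": Fraction(0),
        "incorrect": Fraction(-1, n_alternatives - 1),
    }[response]
    # Item score after the key is changed.
    if option == "not_scored":
        new = Fraction(0)
    elif option == "all_correct":
        new = Fraction(1)
    else:
        raise ValueError("unknown scoring option: " + option)
    return new - original

for response in ("correct", "omit", "incorrect"):
    print(response,
          float(iif_formula(response, "not_scored")),
          float(iif_formula(response, "all_correct")))
```

For every response, the second value is exactly one point higher than the first, which is the relationship between (30) and (26).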

When the item was deleted and not scored, the new test Y' was harder 
than the original test Y. When the item is scored all options correct, 
however, the new test is easier than the original test. This impacts on 
the equating function analysis. The ultimate effect is that the 
equating function for the new test is always lower than the equating 
function for the original test, i.e., it is easier to obtain a 
particular formula score on the new test than it was on the original 
test. Hence, for any given formula score, the scaled score on the 
original test exceeds that of the new test. The item's psychometric 
properties determine by how much these conversions differ, as will be 
illustrated in the empirical section of this report. 
Summary of Analytical Decomposition 

The effects of item properties, test properties, individual examinee 
responses and rounding rules combine to produce the item influence 
functions in the reported score metric described earlier. In the 
section on the effects on the equating function, the equating/scaling 
function was decomposed into its old form and new form components as 
depicted in (18). Then, it was shown that only the new form components 
were affected by item deletion. Finally, the impact of various 
psychometric properties, such as difficulty and discrimination, on these 
new form transformations was discussed and summarized in the section 
entitled the effects of item parameters on equating functions. 

Next, the effects of individual examinees' responses to the deleted 
item were examined, and the item influence function was introduced. In 
the formula score metric, there are three item influence functions, one 
for each possible item score, that are independent of ability and the 
same across all items with the same number of response alternatives. In 
the reported score metric, however, item influence functions were shown 
to depend on examinee ability and changes in the equating function as 
well. In addition, the impact of rounding rules was noted. In sum, we 
have decomposed the item deletion effect into its various components. 
As a consequence, for any given examinee, we can project their new 
reported score from their original formula score, their response to the 
deleted item, and the test characteristic function for the original 
test. Hence, we can project the individual effects for all examinees. 
These individual effects culminate in effects on reported score 
distributions. The ultimate effect on reported score distributions 
depends on the ability distribution in the population of interest and 
the responses of members of that population to the deleted item. The 
particular nature of these effects will vary from setting to setting. 
These points are illustrated in the next section with data from the May 
1982 administration of the SAT. 
Illustrations of Item Deletion Effects 

Since the circles item appeared on the May 1982 administration of 
the SAT, data from this administration will be used to illustrate the 
effects of item deletion on reported score distributions and equating 
functions. These data provide answers to several interesting 'what if' 
questions. We can examine the effects associated with deleting items 
that have certain psychometric characteristics, enabling us to answer 
what would have occurred if the circles item had different psychometric 
characteristics than it had. In particular, six items were selected for 
deletion. Item MD was the most difficult item on the test 
($b_{MD} = 2.87$); item LD was the least difficult ($b_{LD} = -2.48$). 
In addition to these two extremes on the difficulty continuum, a highly 
discriminating item ($a_{AC} = 1.73$) with a high lower asymptote 
($c_{AC} = .27$), a highly discriminating item ($a_{Ac} = 1.48$) with a 
low lower asymptote ($c_{Ac} = .05$), a poorly discriminating item 
($a_{aC} = .55$) with a high lower asymptote ($c_{aC} = .27$), and a 
poorly discriminating item ($a_{ac} = .52$) with a low lower asymptote 
($c_{ac} = .03$) were selected for deletion. These four items are 
denoted by AC, Ac, aC, and ac, respectively. Table 2 contains the item 
parameters for the six items and the circles item. 
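
To give a feel for what the a, b, and c values mean, the sketch below evaluates a three-parameter logistic item response function at the parameter estimates reported in Table 2. The model form and the scaling constant D = 1.7 are assumptions on my part for illustration (this section names the parameters but does not restate the model), and the function and variable names are hypothetical.

```python
import math

# (a, b, c) estimates as reported in Table 2.
ITEMS = {
    "MD": (0.83, 2.88, 0.15),
    "LD": (0.53, -2.48, 0.08),
    "AC": (1.73, 0.94, 0.27),
    "Ac": (1.48, 2.16, 0.05),
    "aC": (0.55, 1.61, 0.27),
    "ac": (0.52, 0.03, 0.03),
    "Circles": (1.30, 1.10, 0.24),
}

def p_correct(theta, a, b, c, D=1.7):
    """Probability of a correct response at ability theta under an
    assumed three-parameter logistic model (D = 1.7 is an assumption)."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

# Probability of a correct response at three ability levels.
for name, (a, b, c) in ITEMS.items():
    probs = [round(p_correct(theta, a, b, c), 3) for theta in (-2.0, 0.0, 2.0)]
    print(name, probs)
```

Under this assumed model, the probability at theta = b is halfway between c and 1, low-ability examinees approach c, and a larger a makes the transition between the two steeper.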
Item Deletion Simulation Procedures 



The effects of item deletion were studied in the following manner. 
Each of the six items was deleted from the 60-item total test containing 
the circles item and the 59-item test that excluded the circles item. 
As a consequence, six separate 59-item tests and six separate 58-item 
tests were simulated. Each of these 12 tests will be compared to the 
full 60-item test. 

Table 2 
Item Parameter Estimates for Deleted Items 

Item       a (discrimination)   b (difficulty)   c (lower asymptote)
MD                 .83                2.88               .15
LD                 .53               -2.48               .08
AC                1.73                 .94               .27
Ac                1.48                2.16               .05
aC                 .55                1.61               .27
ac                 .52                 .03               .03
Circles           1.30                1.10               .24

One type of comparison focuses on equating/scaling functions and 
differences induced by deletion of items with certain properties. 
Effects on both rounded and unrounded converted scores will be assessed. 
Difference plots are used to examine these effects. 

In addition, effects of item deletion on examinee formula scores are 
assessed. This step necessitated rescoring for a representative sample 
of 45,579 examinees, the same set of examinees used to assess the effect 
of deleting the circles item. Differences among rounded and unrounded 
scaled scores are summarized. 
Results 

Equating/scaling functions. Figure 1 depicts the effects on the 
equating/scaling functions produced by deleting the most difficult item 
by itself (rounded differences are plotted as 00000, unrounded 
differences as a separate curve), and with the circles item (rounded 
differences are plotted as +++++, unrounded differences as a separate 
curve). In this figure, and all subsequent figures, the rounded 
differences are discrete, taking on one of the five possible values: 
-20, -10, 0, +10, +20. These rounded differences are obtained by 
subtracting, for a given formula score, the re-equated rounded scaled 
score for the 59-item test (or the 58-item test) produced by deleting 
the most difficult item by itself (or by deletion of that item and the 
circles item as well) from the rounded scaled score for the original 
60-item test. In contrast, the "unrounded" differences are differences 
of unrounded scaled score conversions that are rounded to the units 
place. 

This figure and subsequent figures contain an upper and lower panel. 
Both panels contain four difference plots, two rounded and two 
unrounded, two for deleting a single item (in Figure 1, item MD) and two 
for deleting that item plus the circles item. In the upper panel, the 
differences between the original conversions and those obtained by 
scoring all options correct and re-equating are plotted. The lower 
panel contains the differences between the original conversions and 
those obtained by not scoring and re-equating. 

Figure 1 

Differences in Unrounded and Rounded Equating/Scaling Functions 
for the 59-item (and 58-item) Tests Produced by Deletion of the 
Most Difficult (MD) Item (and the Circles Item) 
(Original Equating - Re-equating) 

MOST DIFFICULT ITEM SCORED ALL OPTIONS CORRECT. 

Examination of the bottom panel of Figure 1 reveals that, at the 
unrounded scaled score level, deletion of the most difficult item has a 
negligible effect on the equating function for not scoring the item at 
formula scores less than 50. In fact, the only unrounded difference 
which exceeds -5.0 in magnitude occurs at a formula score of 59. There 
is hardly any effect evident at the rounded scaled score level either. 
The rounded differences of +10 that occur for the three formula scores 
below 50 are actually negligible unrounded differences: 294.9656 vs. 
295.5527 at a formula score of 4; 384.6948 vs. 385.2297 at a formula 
score of 14; 514.9980 vs. 515.3547 at a formula score of 29. The -10 at 
a formula score of 41 is also the result of a negligible difference, 
625.1816 vs. 624.9834. In short, deletion of the most difficult item has 
very little effect on the equating function for not scoring the item 
because the shortened test is almost as easy as the longer test. 
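
The four rounding artifacts just described can be checked with a short calculation. The pairs of unrounded conversions are the ones quoted above; the rounding function (nearest multiple of 10, ties rounded up) and all names are my assumptions about how the reported-score rounding rule behaves.

```python
import math

def r_ss(x):
    """Round a scaled score to the nearest multiple of 10 (ties up, an assumption)."""
    return 10 * math.floor(x / 10 + 0.5)

# formula score -> the two unrounded conversions quoted in the text
PAIRS = {
    4: (294.9656, 295.5527),
    14: (384.6948, 385.2297),
    29: (514.9980, 515.3547),
    41: (625.1816, 624.9834),
}

for fs, (first, second) in PAIRS.items():
    unrounded_gap = abs(first - second)
    rounded_gap = abs(r_ss(first) - r_ss(second))
    print(fs, round(unrounded_gap, 4), rounded_gap)
```

In each case the two conversions differ by less than one scaled score point but fall on opposite sides of a rounding boundary, so the reported scores differ by 10.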

When the circles item is also deleted to produce a 58-item test, a 
greater effect occurs. Note, in the bottom panel, that for formula 
scores of 41 and greater, the rounded conversions for the 58-item test 
exceed those of the full 60-item test at all but one formula score, 49, 
where rounding produces an equal scaled score of 700 because the 60-item 
conversion is 695.2595 while the 58-item conversion is 703.8240. The 
plot of the unrounded conversion for the 58-item test indicates a 
noticeable downward slope that begins at about a formula score of 30. 
At a formula score of 40, the difference in unrounded conversions begins 
to systematically exceed -5.0 in magnitude. By 53, this difference 
exceeds -10 in magnitude. Clearly the additional deletion of the circles 
item has a much greater effect on the equating function for not scoring 
than did deletion of just the most difficult item. 

The upper panel of Figure 1 provides a sharp contrast to the lower 
panel of the same figure. Here the differences associated with scoring 
the item all options correct are depicted. At every formula score, the 
original conversions exceed the new conversions that result from 
re-equating. Obviously, deletion of the most difficult item and scoring 
all options correct has a substantial effect on the equating function. 
In short, the decision to score a difficult item all options correct 
makes the new test noticeably easier than the old test. Deletion of the 
circles item as well merely increases the differential in difficulty. 

The contrast between the panels in Figure 1 can illuminate 
discussion about the necessity to re-equate tests after item deletion. 
Recall that it was stated earlier that re-equating makes it 
psychometrically irrelevant whether the deleted item is scored all 
options correct or deleted. Figure 1, as well as all subsequent 
figures, can be used to demonstrate that how one scores the item becomes 
very important when the decision is made not to re-equate the test after 
item deletion. The upper panel reflects the differences in scaled 
scores for a given formula score that would be obtained if the original 
conversion were used instead of the conversion produced by re-equating 
the new test. Except at very low and very high scaled scores, and a few 
points in between where rounding impacts on the results, use of the 
original conversion with the adjustment of formula scores resulting from 
scoring the flawed item all options correct would yield scaled scores 
that were 10 to 20 points higher than what the re-equating would suggest 
are appropriate. In short, there would be a systematic positive bias 
introduced into the scaled scores. 

In contrast, the bottom panel in Figure 1 reveals that use of the 
original conversion with the adjustment of formula scores resulting from 
not scoring the flawed item would yield scaled scores that were equal to 
the scaled scores produced by re-equating except at high formula scores 
where the re-equated scores would exceed the original conversions 
(reflected by negative differences in the bottom panel). In short, the 
decision to not re-equate, if it were made, would make the scoring 
decision, not score vs. score all options correct, very important: 
Giving credit tends to make all scores higher than the re-equated 
scores, while not scoring tends to make all scores lower than the same 
re-equated scores. In addition, not re-equating allows the psychometric 
properties of the deleted items to impact on the nature of these 
differences, as will be seen in subsequent figures. 

Figure 2 provides a striking contrast to Figure 1. Here, the 
effects of deleting the least difficult item are depicted. Note that 
all four difference plots in the upper panel reflect positive 
differences, while all four difference plots in the bottom panel 
indicate negative differences, which implies that all formula scores 
above 6 are easier to obtain on the 60-item test than on either the 
58-item or 59-item tests under not scoring, and harder on the 60-item 
test than they are under scoring all options correct. In short, all 
four difference curves in the bottom panel are consistent with the fact 
that deleting the easiest item provides a shorter, more difficult test, 
while the upper panel indicates that scoring all options correct makes 
the test easier. By a formula score of 2.0, the unrounded conversions 
in the bottom panel for both the 58-item and 59-item tests exceed -5.0 
in magnitude. Note that at approximately a formula score of 36, the 
unrounded difference plot in the bottom panel for the 59-item test has 
leveled off, while that of the 58-item test continues to slope downward. 
This latter effect, separation of the two unrounded difference plots at 
around 30, which also occurs in the upper panel, also occurred in Figure 
1 and reflects deletion of the circles item. 

Figure 2 

Differences in Unrounded and Rounded Equating/Scaling Functions 
for the 59-item (and 58-item) Tests Produced by Deletion of the 
Least Difficult (LD) Item (and the Circles Item) 
(Original Equating - Re-equating) 

Examination of Figures 1 and 2 provides insight into what might 
happen if one decided not to score a flawed item and not to re-equate, 
which is unsound from a psychometric vantage point. If the flawed item 
were hard, as is the case in Figure 1, the converted scores would agree 
up to formula scores in the high fifties (see lower panel). In fact, 
use of the original conversion would avoid the three roundoff problems 
at 4, 14, and 29 that were discussed earlier. Deletion of the second 
item, however, introduces consistent differences above 41 that would be 
ignored if the original conversion were used. 

While use of the original 60-item conversion on the 59-item test 
resulting from deleting and not scoring the most difficult item might 
not affect scores much, deleting and not scoring the easy item is 
another story. Here, use of the original conversion with the 59-item 
and 58-item tests, whose equating function differences are depicted in 
the lower panel of Figure 2, would yield substantially lower converted 
scores than would re-equating. 

Comparison of the upper panels of Figures 1 and 2 provides us with 
insight into what would happen if the decision to score all options 
correct were accompanied by a decision to not re-equate. As noted in 
the discussion of Figure 1, scoring the most difficult item all options 
correct has a profound impact on the equating function compared to 
deleting and not scoring that same item. In contrast, the upper panel 
in Figure 2 reveals that scoring the easiest item all options correct 
only affects low formula scores, those below 6, while the lower panel 
reveals that not scoring the easiest item affects almost all scores 
above 6. In short, the decision not to re-equate allows both the flawed 
item's psychometric properties and the scoring decision to impact 
significantly on reported scores. When there is no re-equating, scoring 
the most difficult item all options correct produces the largest scaled 
scores, followed by scoring the easiest item all options correct. Also 
under no re-equating, not scoring the easiest item produces the lowest 
scaled scores, followed by deleting and not scoring the hardest item. 
Hence, not re-equating allows the properties of the deleted problem item 
and the scoring decision to interact and impact significantly on 
reported scores. In contrast, re-equating makes the scoring decision 
irrelevant and, as will be seen, mitigates the effects of the deleted 
item's psychometric properties. 

In the four remaining figures, attention will be paid to the lower 
panels only, where discussion will focus on the effects of deletion on 
equating and scaling functions. Since re-equating is clearly desirable, 
re-equating vs. not re-equating will not be discussed explicitly with 
these figures. The reader, however, can compare the upper and lower 
panels of these subsequent figures to project the effects of re-equating 
vs. not re-equating. The re-equating vs. not re-equating issue will be 
revisited explicitly when the effects on score distributions are 
addressed. 

Figure 3 depicts the effects of deleting a highly discriminating 
item with a high lower asymptote, item AC. The first thing to note in 
the figure is that all four difference curves in the lower panel are 
below zero at all formula scores. This follows from the fact that 
$c_{AC} = .27$, i.e., the lower asymptote exceeds chance level 
performance. Hence, the shortened test is harder than the longer test at 
all formula score levels, a point noted earlier in the analytical 
analysis. Another aspect worth noting is that for formula scores below 
20, the unrounded differences are close to zero, a fact that is 
characteristic of highly discriminating items. It should be noted that 
the circles item was highly discriminating also, $a = 1.30$. Observe 
that the difference curves begin to descend noticeably and level off 
quickly also, a characteristic of highly discriminating items. 

Figure 4 depicts the effects of deleting another highly 
discriminating item, Ac. Item Ac, however, has a low lower asymptote, 
$c_{Ac} = .05$, and a high difficulty, $b_{Ac} = 2.16$, to accompany its 
high $a_{Ac} = 1.48$. As a consequence, the shortened tests are easier 
than the 60-item test for sizeable portions of the formula score range: 
up to a formula score of 28 for the 58-item test, and up to a formula 
score of 45 for the 59-item test. As in Figure 3, sharp declines and 
leveling off occur in the difference curves. The nine +10 differences 
observed for the rounded 59-item conversion are clearly rounding 
artifacts. 

Figure 3 

Differences in Unrounded and Rounded Equating/Scaling Functions 
for the 59-item (and 58-item) Tests Produced by Deletion of the 
High A - High C (AC) Item (and the Circles Item) 
(Original Equating - Re-equating) 

Figure 4 

Differences in Unrounded and Rounded Equating/Scaling Functions 
for the 59-item (and 58-item) Tests Produced by Deletion of the 
High A - Low C (Ac) Item (and the Circles Item) 
(Original Equating - Re-equating) 

The sharp declines and abrupt leveling off observed in the bottom 
panels of the last two figures are not replicated in the next two 
figures, which depict the effects on equating functions of deleting 
items with low discriminating power. Figure 5 depicts the effects of 
dropping the aC item ($a_{aC} = .55$, $b_{aC} = 1.61$, $c_{aC} = .27$), 
while deletion of the ac item ($a_{ac} = .52$, $b_{ac} = .03$, 
$c_{ac} = .03$) is depicted in Figure 6. In both these figures, the 
declines in the unrounded difference curves are gradual. In the latter 
figure, the decline starts sooner because it is an easier item, which 
also accounts for the larger number of -20 rounded differences evident 
in this figure. Note that in contrast to Figure 5, unrounded differences 
in Figure 6 can be positive since the lower asymptote is nearly zero. 

Figures 1-6 depict the effects of deleting various items from the 
full 60-item test on the equating functions. Several effects were 
noted. The magnitude of the c-parameter constricts the range of the 
effect. Sufficiently high c-parameters preclude the occurrence of 
positive differences. The location of the b-parameter determines where 
the effect occurs along the formula score range, while the a-parameter 
affects the sharpness and duration of the effect. In addition to the 
effects of these item parameters, we observed the effects of the 
rounding rules. In fact, in Figures 1 and 4, the rounding effects 
tended to be the dominant effects. 

Figure 5 

Differences in Unrounded and Rounded Equating/Scaling Functions 
for the 59-item (and 58-item) Tests Produced by Deletion of the 
Low A - High C (aC) Item (and the Circles Item) 
(Original Equating - Re-equating) 

Figure 6 

Differences in Unrounded and Rounded Equating/Scaling Functions 
for the 59-item (and 58-item) Tests Produced by Deletion of the 
Low A - Low C (ac) Item (and the Circles Item) 
(Original Equating - Re-equating) 

Finally, the discussion of Figures 1 and 2 made it clear that from a 
psychometric viewpoint not re-equating is less desirable than 
re-equating since not re-equating allows the deleted item scoring 
decision and the deleted item's psychometric properties to have 
significant impact on the converted scores. In contrast, re-equating 
makes the scoring decision (all options correct vs. not score) 
irrelevant and attempts to mitigate the impact of the item's 
psychometric properties. The results presented in the next two sections 
clarify these points. 

Equated score distributions after re-equating. In addition to 
altering the equating function, deletion of an item can affect the 
formula score of an individual. There are different item influence 
functions for each possible response to the item: correct, incorrect or 
omit. In addition, the scoring decision has an impact. Here, we limit 
the discussion to not scoring the deleted item. Those who answered the 
deleted item correctly lose one formula score point, those who omitted 
the deleted item keep the same formula score, and those who answered 
incorrectly either retain the same formula score or gain one formula 
score point. The effect on the equating function, illustrated in 
Figures 1-6, and the effects on formula scores combine to impact on 
scaled scores. For a group of individuals, these effects translate into 
a distribution of difference scores for that group. Tables 3-8 
summarize these distributions of differences resulting from deleting the 
seven items under study, and correspond to Figures 1-6, respectively. 
Each table contains nine columns, the first of which is scaled score 
difference, ranging from 30 to -20. The next four columns contain the 
absolute and relative frequencies for unrounded and rounded differences 
between individuals' scaled scores on the 59-item test resulting from 
item deletion and their original score on the 60-item test. A positive 
difference indicates the new score exceeds the original score. The last 
four columns present the same data for the 58-item test produced by 
deleting the circles item from the 59-item test. At the bottom of each 
table are means, standard deviations, sample sizes and modes. 
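
The layout of such a table can be sketched as follows. The list of rounded differences is made up purely for illustration and is not taken from Tables 3-8; the function name is hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which form was reported.

```python
from collections import Counter
import statistics

def summarize(differences):
    """Frequency rows plus the summary statistics listed at the bottom
    of each table: mean, standard deviation, sample size and mode."""
    counts = Counter(differences)
    n = len(differences)
    # One row per observed difference, largest first:
    # (difference, absolute frequency, relative frequency)
    rows = [(d, counts[d], counts[d] / n) for d in sorted(counts, reverse=True)]
    return {
        "rows": rows,
        "mean": statistics.mean(differences),
        "sd": statistics.pstdev(differences),  # population SD (assumption)
        "n": n,
        "mode": statistics.mode(differences),
    }

# Hypothetical rounded differences (new score minus original score).
rounded = [0, 0, 0, -10, 0, 10, 0, -10, 0, 0]
summary = summarize(rounded)
for row in summary["rows"]:
    print(row)
print(summary["mean"], summary["sd"], summary["n"], summary["mode"])
```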



Table 3 contains the results for the most difficult item, item MD. 
Figure 1 is the corresponding figure depicting equating function 
differences. Note that over 71% of the rounded differences equal zero 
for the 59-item test. Of the remaining 29.9%, slightly more than half 
are -10. In the unrounded difference distributions for the 59-item 
test, over 73% of the distribution is in the range -2 to 4. In 
addition, there are two other peaks at 8 to 10 and -10 to -5. Note also 
that use of the rounding rule tends to spread out differences, e.g., 
despite the fact that the maximum unrounded difference is 13, 19 people 
receive +20 over their original scaled score. This spreading is 
reflected in a larger standard deviation for rounded differences. Note 
that rounding also pushes the mean farther from its original value. 

Deleting the circles item from the 59-item test has the expected 
effect of spreading scores out even more. The trimodal nature of the 
unrounded distribution is less apparent than it was before deleting the 
circles item. In addition, differences of -20 are produced for rounded 
scores, where only 52.7% of the difference scores are zero. The summary 
statistics at the bottom of the table are interesting. As expected, 
deletion of the second item increases the spread of both rounded and 
unrounded difference scores. More importantly, the mean difference for 
rounded scores is pushed even further from the original mean, while the 
mean of the unrounded scores remains relatively close to the original 
mean. The impact of rounding that is evident here will be evident in 
subsequent tables and will tend to be a relatively major factor. 

Table 4 portrays the distributional effects produced by deletion of 
the easiest item (LD). The trimodality evident with deletion of the 
most difficult item is missing in the distribution of unrounded 
differences in this table. Deletion of this item produces even more 
zero differences than deletion of the hard item. The mean differences, 
however, are more negative for this item. When Tables 3 and 4 are placed 
side by side, it becomes apparent that rounding and the fact that an 
item is being deleted have bigger effects than do the psychometric 
properties of the item. These facts are most evident in the summary 
statistics. 

Tables 5-8 contain the results summarizing the distributional 

effects produced by deleting the AC, Ac, aC, and ac items, respectively. 

Table 6 is the most unique of these four tables. The other three tables * 

reveal that deletion 6f the AC, aC or ac item has more effect on scaled 

scores than deletion* of either the most difficult or least difficult 

item, a result consistent with the psychometric expectation that 

deletion of items of middle difficulty will have a greater effect on 

score distributions than deletion of very hard or very easy items. The 

standard deviations reported in these tables summarize this effect. 

Note, however, that the fact that items are to be deleted and scores 

rounded tend to have sizeable effects as well. In conjunction with 

Tables 1 and 2, these three tables provide evidence for the complex 

interaction of rounding rules , the act of deletion, and psychometric 

properties. In all six tables* the standard deviation of rounded 

differences for the 58-item test exceeds that of the unrounded 

differences. The same ordered relationship holds for the 59-item test. 

This consistent ordering reflects the impact of rounding. The act of 

item deletion accounts for the fact that the 58-item standard deviation 

exceed the 59-item standard deviations, which exceed zero. Finally* the 

4 

impact of psychometric characteristics is evident in the differences 
across tables in the magnitudes of the standard deviations. 
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Teble 5 
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Tabla 7 

Distributions ^£ Dlffarancat Bttvaan Ra-aquatad(RE) and Original (OS) Rounds 
and Unroundti SctUd Scorat Associated with Delation of Itaa aC (U - OS) 
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The mean differences demonstrate the interaction between rounding 
and number of items deleted. With the exception of Table 6, most mean
unrounded differences are close to zero. In contrast, the absolute
values of the mean rounded differences for the 59-item tests are in the
range .08 to .15, while those for the 58-item tests are in the range .19
to .84. Rounding exaggerates the item deletion effect.

Table 6 is the exception to the rule. Item Ac is the only item for
which the psychometric properties have much of an impact on mean
unrounded differences. This item is highly discriminating (a_Ac = 1.48),
above average in difficulty (b_Ac = 1.48), and has a low c-parameter
(c_Ac = .05). Recall that when discussing Figure 5, it was noted that
the shortened tests were easier than the original 60-item test for most
of the formula score distribution. As a consequence, many more people
were affected negatively by deleting this item than were affected
negatively by deletion of the other items. If this item had been of
middle difficulty, the resultant standard deviation of differences
would have been larger than any of those observed, and the mean
difference of unrounded scores would have been close to zero. In short,
the high a-parameter and low c-parameter allow the difficulty parameter
to have its maximum effect.
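
To make the roles of the three parameters concrete, the following
sketch evaluates an item characteristic curve at the parameter values
quoted above for item Ac. It assumes that a, b, and c are the
discrimination, difficulty, and lower-asymptote parameters of the
three-parameter logistic model with the conventional scaling constant
D = 1.7; the function name and the ability values are illustrative
choices, not taken from the report.

```python
import math

D = 1.7  # conventional logistic scaling constant (an assumption here)

def p_correct(theta, a, b, c):
    """Three-parameter logistic probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

# Parameter values quoted in the text for item Ac.
a_Ac, b_Ac, c_Ac = 1.48, 1.48, 0.05

# Illustrative ability values spanning the scale.
for theta in (-2.0, 0.0, 1.0, 1.48, 2.0, 3.0):
    p = p_correct(theta, a_Ac, b_Ac, c_Ac)
    print(f"theta = {theta:5.2f}   P(correct) = {p:.3f}")
```

With a low c and a high a, the curve stays near the lower asymptote for
low abilities and rises steeply in the neighborhood of b, so the item
separates candidates sharply around its difficulty level.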

To re-equate or not to re-equate. Tables 9-14 parallel Tables 3-8
and illustrate what would happen to scaled score distributions if, after
the flawed items were scored all options correct, a decision was made
not to re-equate. These six tables contain differences between scaled
scores based on re-equating and scaled scores based on using the
original conversions on the "all options correct" adjusted formula
scores. In all six tables, all differences, rounded and unrounded, are
non-negative, indicating that use of the original conversions with the
"all options correct" adjusted formula scores introduces a positive
bias, i.e., these "converted" scores are always as high as or higher
than the appropriate converted scores resulting from re-equating after
deletion. Note that the uniformly non-negative differences are
consistent with the upper panels in Figures 1-6.

The extent of the positive bias clearly is related to the 
psychometric properties of the deleted item, in particular its 
difficulty level: the more difficult the deleted item, the larger the
positive bias. The mean differences in Tables 9-14 reflect the extent 
of positive bias. 
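
One way to see why harder items go with a larger positive bias is to
look at how much credit a candidate gains, on average, when the flawed
item is scored correct for everyone. The sketch below does not
reproduce the report's equating. It assumes a three-parameter logistic
item with scaling constant D = 1.7, five answer choices, formula
scoring of +1 for a right answer and -1/4 for a wrong answer, and no
omits; the two items and all function names are hypothetical.

```python
import math

D = 1.7        # assumed logistic scaling constant
N_OPTIONS = 5  # assumed number of answer choices

def p_correct(theta, a, b, c):
    """Three-parameter logistic probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def expected_item_score(theta, a, b, c):
    """Expected formula-score contribution: +1 if right, -1/(k-1) if wrong."""
    p = p_correct(theta, a, b, c)
    return p - (1.0 - p) / (N_OPTIONS - 1)

def credit_gain(theta, a, b, c):
    """Expected points gained when the item is scored correct for everyone."""
    return 1.0 - expected_item_score(theta, a, b, c)

# Two hypothetical items that differ only in difficulty.
easy = dict(a=1.0, b=-1.5, c=0.2)
hard = dict(a=1.0, b=1.5, c=0.2)

abilities = (-2.0, -1.0, 0.0, 1.0, 2.0)
for theta in abilities:
    gain_easy = credit_gain(theta, **easy)
    gain_hard = credit_gain(theta, **hard)
    print(f"theta = {theta:5.2f}   easy: {gain_easy:.3f}   hard: {gain_hard:.3f}")
```

At every ability level the expected gain is larger for the harder item,
because fewer candidates would have earned the point on their own. If
the original conversion is then applied without re-equating, that
unearned credit is carried into the reported scores, which is in line
with the direction of the bias described above.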

The final point to note in Tables 9-14 is that the decision not to 
re-equate has enabled the deleted item's psychometric properties to have 
the dominant impact on reported scores. In contrast, re-equating put
the item's properties on a par with the arbitrary rounding effects.
Clearly, re-equating after item deletion is necessary from a
psychometric viewpoint. 
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Table 9 

Distributions of Differences Between Unrounded and Rounded 
Scores Associated with Scoring Item MD All Options Correct and 
Using Original Equating(OE) vs. Re-equating(RE) : (OE - RE) 
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a Although the 59-item and 58-item tests literally are both 60 items
long when the problem item(s) is (are) scored all options correct,
the headings remain 59-item and 58-item for two reasons: (1) To
facilitate comparison of these results with those obtained when the 
deleted item(s) is (are) not scored; (2) These 60-item tests are 
figuratively 59-item and 58-item tests because individual candidate 
responses to the deleted item or items are ignored. 
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Table 10 

Distributions of Differences Between Unrounded and Rounded 
Scores Associated with Scoring Item LD All Options Correct and 
Using Original Equating(OE) vs. Re-equating(RE) : (OE - RE)
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a Although the 59-item and 58-item tests literally are both 60 items
long when the problem item(s) is (are) scored all options correct,
the headings remain 59-item and 58-item for two reasons: (1) To
facilitate comparison of these results with those obtained when the
deleted item(s) is (are) not scored; (2) These 60-item tests are
figuratively 59-item and 58-item tests because individual candidate
responses to the deleted item or items are ignored.
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Table 11 

Distributions of Differences Between Unrounded and Rounded
Scores Associated with Scoring Item AC All Options Correct and
Using Original Equating(OE) vs. Re-equating(RE) : (OE - RE)

                      Item AC                     Item AC
                     59-item a                   58-item a
Scaled Score  unrounded      rounded      unrounded      rounded
Difference        N      %      N      %      N      %      N      %

20.000            0    0.0                    0    0.0  18495    40.
19.000            0    0.0                    0    0.0
18.000            0    0.0                  [?]    2.8
17.000            0    0.0                  [?]    [?]
16.000            0    0.0                  [?]    [?]
15.000            0    0.0                  [?]   11.7
14.000            0    0.0                  [?]    [?]
13.000            0    0.0                 1178    2.6
12.000            0    0.0                 2429    5.3
11.000            0    0.0                 1033    2.3
10.000            0    0.0  31351   68.5    882    1.9  20900    45.
 9.000        10314   22.5                 1001    2.2
 8.000        17606   38.5                  897    2.0
 7.000         5110   11.2                  841    1.8
 6.000         2355    5.1                  707    1.5
 5.000         2074    4.5                  561    1.2
 4.000         1725    3.8                 1228    2.7
 3.000         1604    3.5                  510    1.1
 2.000         1184    2.6                 1030    2.3
 1.000         2239    4.9                  812    1.8
 0.0           1548    3.4  14408   31.5    448    1.0   6364    13.

N                 45759         45759         45759         45759
Mean               6.78          6.85         13.56         12.65
S.D.               2.54          4.64          4.54          6.88

[?] marks an entry that is illegible in the source copy.



a Although the 59-item and 58-item tests literally are both 60 items
long when the problem item(s) is (are) scored all options correct, 
the headings remain 59-item and 58-item for two reasons: (1) To 
facilitate comparison of these results with those obtained when the 
deleted item(s) is (are) not scored; (2) These 60-item tests are 
figuratively 59-item and 58-item tests because individual candidate 
responses to the deleted item or items are ignored. 
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Table 12 



Distributions of Differences Between Unrounded and Rounded
Scores Associated with Scoring Item Ac All Options Correct and
Using Original Equating(OE) vs. Re-equating(RE) : (OE - RE)

                      Item Ac                     Item Ac
                     59-item a                   58-item a
Scaled Score  unrounded      rounded      unrounded      rounded
Difference        N      %      N      %      N      %      N      %

20.000            0    0.0   2703    5.9    574    1.3  31506    68.
19.000            0    0.0                 9669   21.1
18.000            0    0.0                21481   46.9
17.000            0    0.0                 3483    7.6
16.000            0    0.0                 1982    4.3
15.000            0    0.0                 1893    4.1
14.000            0    0.0                 1584    3.5
13.000            0    0.0                  622    1.4
12.000            0    0.0                  689    1.5
11.000        17646   38.6                 1122    2.5
10.000        23301   50.9  41872   91.5    424    0.9  13331    29.
 9.000         1789    3.9                  390    0.9
 8.000          879    1.9                  397    0.9
 7.000          748    1.6                  313    0.7
 6.000          302    0.7                  242    0.5
 5.000          349    0.8                  122    0.3
 4.000          233    0.5                  247    0.5
 3.000          194    0.4                  202    0.4
 2.000          140    0.3                  137    0.3
 1.000           94    0.2                  102    0.2
 0.0             84    0.2   1184    2.6     84    0.2    922     2.

N                 45759         45759         45759         45759
Mean              10.07         10.33         16.81         16.68
S.D.               1.37          2.90          3.22          5.12

a Although the 59-item and 58-item tests literally are both 60 items
long when the problem item(s) is (are) scored all options correct, 
the headings remain 59-item and 58-item for two reasons: (1) To 
facilitate comparison of these results with those obtained when the 
deleted item(s) is (are) not scored; (2) These 60-item tests are 
figuratively 59-item and 58-item tests because individual candidate 
responses to the deleted item or items are ignored. 
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Table 13 

Distributions of Differences Between Unrounded and Rounded
Scores Associated with Scoring Item aC All Options Correct and
Using Original Equating(OE) vs. Re-equating(RE) : (OE - RE)

                      Item aC                     Item aC
                     59-item a                   58-item a
Scaled Score  unrounded      rounded      unrounded      rounded
Difference        N      %      N      %      N      %      N      %

20.000            0    0.0                    0    0.0  14655    32.
19.000            0    0.0                    0    0.0
18.000            0    0.0                    0    0.0
17.000            0    0.0                 2188    4.8
16.000            0    0.0                 7446   16.3
15.000            0    0.0                 6759   14.8
14.000            0    0.0                11956   26.1
13.000            0    0.0                 5065   11.1
12.000            0    0.0                 2312    5.1
11.000            0    0.0                 1946    4.3
10.000            0    0.0  28435   62.1   1817    4.0  27990    61.
 9.000            0    0.0                  816    1.8
 8.000         6685   14.6                 1289    2.8
 7.000        14771   32.3                 1239    2.7
 6.000        15948   34.9                  861    1.9
 5.000         4424    9.7                  735    1.6
 4.000         2312    5.1                  545    1.2
 3.000         1143    2.5                  493    1.1
 2.000          354    0.8                  168    0.4
 1.000           96    0.2                   98    0.2
 0.0             26    0.1  17324   37.9     26    0.1   3114     6.

N                 45759         45759         45759         45759
Mean               6.30          6.21         13.06         12.52
S.D.               1.24          4.85          3.23          5.70

a Although the 59-item and 58-item tests literally are both 60 items
long when the problem item(s) is (are) scored all options correct,
the headings remain 59-item and 58-item for two reasons: (1) To
facilitate comparison of these results with those obtained when the
deleted item(s) is (are) not scored; (2) These 60-item tests are
figuratively 59-item and 58-item tests because individual candidate
responses to the deleted item or items are ignored.
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Table 14 

Distributions of Differences Between Unrounded and Rounded
Scores Associated with Scoring Item ac All Options Correct and
Using Original Equating(OE) vs. Re-equating(RE) : (OE - RE)

                      Item ac                     Item ac
                     59-item a                   58-item a
Scaled Score  unrounded      rounded      unrounded      rounded
Difference        N      %      N      %      N      %      N      %

20.000            0    0.0                    0    0.0  14136    30.
19.000            0    0.0                  674    1.5
18.000            0    0.0                 1479    3.2
17.000            0    0.0                 3115    6.8
16.000            0    0.0                 3482    7.6
15.000            0    0.0                 4196    9.2
14.000            0    0.0                 5000   10.9
13.000            0    0.0                 6763   14.8
12.000            0    0.0                 5297   11.6
11.000            0    0.0                 3620    7.9
10.000          953    2.1  25092   54.8   2337    5.1  26046    56.
 9.000         2174    4.8                 1869    4.1
 8.000         4532    9.9                  960    2.1
 7.000         5126   11.2                 1668    3.6
 6.000         7593   16.6                 1271    2.8
 5.000         9287   20.3                 1210    2.6
 4.000         7899   17.3                  824    1.8
 3.000         4383    9.6                  726    1.6
 2.000         2581    5.6                  800    1.7
 1.000         1157    2.5                  397    0.9
 0.0             74    0.2  20667   45.2     71    0.2   5577    12.

N                 45759         45759         45759         45759
Mean               5.34          5.48         12.09         11.87
S.D.               2.04          4.98          3.98          6.29

a Although the 59-item and 58-item tests literally are both 60 items
long when the problem item(s) is (are) scored all options correct,
the headings remain 59-item and 58-item for two reasons: (1) To
facilitate comparison of these results with those obtained when the
deleted item(s) is (are) not scored; (2) These 60-item tests are
figuratively 59-item and 58-item tests because individual candidate
responses to the deleted item or items are ignored.
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Conclusions 

This report has presented a formal analysis of the effects of item
deletion on equating/scaling functions and on reported score
distributions. The analysis, based on item response theory, was used to
decompose the item deletion effect into its constituent elements. This
analysis was supplemented by empirical illustrations drawn from the May
1982 administration of the SAT-Mathematical test that contained the
circles item.

The item deletion effect can be separated into several components.
Deletion introduces changes in the equating function that maps formula
scores onto the reported score scale. The psychometric characteristics
of the item and the rounding rules for scaled scores contribute to the
change in equating function. An item's difficulty determines where the
change in equating function occurs along the formula score continuum.
Deletion of a very difficult item can have no substantial effect on the
equating function when the item is not scored. Deletion of an easy item
under the not score condition, however, can have a very noticeable
effect. In contrast, scoring the item all options correct makes
deletion of the easy item essentially transparent and deletion of the
hard item quite noticeable. An item's discriminating power determines
the abruptness and direction of the effect. Deletion of a highly
discriminating item produces an abrupt change in the equating function
near the item's difficulty parameter. In contrast, deletion of a poorly
discriminating item produces a gradual shift that affects more of the
scores centered around the item's difficulty level. Finally, the item's
susceptibility to guessing modulates the effect. Deleting an item with
a high lower asymptote precludes the occurrence of positive differences
in equating
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functions under the not score condition. Deletion of an item with a
very low lower asymptote will yield positive differences, the number of
which increases with the difficulty and discriminating power of the item.

The illustrative data demonstrated the importance of rounding rules 
for scaled scores. While the formal analysis referred to the impact of 
the rules, the illustrations vividly portrayed their impact. In many 
cases under re-equating, rounding has as large if not a larger effect on
reported scores than the psychometric properties of the item. The same
can be said for the act of item deletion itself. Deleting an item in 
general will have an effect on reported score distributions, a greater 
effect, in fact, than the particular psychometric properties of the 
deleted item, provided that re-equating is performed. 

The reason that the psychometric properties of the item tend to have 
a smaller effect on reported score distribution differences than either 
rounding the scaled scores or the act of item deletion is re-equating. 
Re-equating, particularly via item response theory, compensates for the 
loss of the deleted item's psychometric properties. As a corollary,
deletion without re-equating allows the deleted item's properties to 
have a more substantial impact. This fact explains why re-equating is 
psychometrically desirable after an item is deleted. 
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